library(tseries) # for na.remove() and other time series utilities
library(forecast) # for Holt-Winters forecasting functions
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
## Loading required package: timeDate
## This is forecast 7.3
library(TTR)
library(stats)
PPT_mth<-read.csv("/Users/sonia/Documents/NRES 746/TimeSeries/retimeseriesgroup/PRISM_ppt_1895-2015Mo2.csv", head=T, skip=10)
#list(PPT_mth)
str(PPT_mth) # looking at the structure of the data
## 'data.frame': 876 obs. of 2 variables:
## $ Date : Factor w/ 876 levels "1895-01","1895-02",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ ppt..inches.: num 2.35 0.96 0.31 0.12 1.32 0.61 5.73 3.52 1.99 1.74 ...
ts(PPT_mth$ppt..inches., start=c(1895,1), frequency=12) -> pptimeseries # converting the 876 monthly values (Jan 1895 - Dec 1967) into a time series
str(pptimeseries) # looking at the structure of ppt
## Time-Series [1:876] from 1895 to 1967.917: 2.35 0.96 0.31 0.12 1.32 0.61 5.73 3.52 1.99 1.74 ...
plot.ts(pptimeseries, main="Monthly precipitation", xlab="Years", ylab= "Precipitation (inches)") # Plotting the timeseries
# Decomposition of the additive time series
pptimeseriescomponents<-decompose(pptimeseries)
# Plotting the trend, seasonal and irregular(random) components
plot(pptimeseriescomponents)
#Using HoltWinters to predict timeseries
HoltWinters(pptimeseries) -> pptimeseries2 # fitting an exponential smoothing model with HoltWinters()
plot(pptimeseries2, lwd=2) # Plotting the forecast against the original data
legend("topleft", col=c("black","red"), c("Observed", "Fitted"), lty=1,cex = 0.8)
# To make forecasts for future times not included in the original time series
pptimeseries3 <-forecast.HoltWinters(pptimeseries2, h=48)
plot.forecast(pptimeseries3)
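The forecast object returned above holds more than the plot shows. A sketch of what it contains, using the built-in co2 series as a stand-in (the precipitation CSV is local to the original author's machine), and the forecast() generic, which dispatches to the same HoltWinters method:

```r
library(forecast)
# Inspect the pieces of a forecast object (co2 used as a stand-in series)
hw <- HoltWinters(co2)
fc <- forecast(hw, h = 48)  # 48 months ahead, as in the example above
head(fc$mean)               # point forecasts
head(fc$lower)              # 80% and 95% lower interval bounds
head(fc$upper)              # 80% and 95% upper interval bounds
```

The `$mean`, `$lower`, and `$upper` elements are what plot.forecast() draws as the line and shaded bands.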
# Investigating whether the model can be improved upon
# autocorrelation function
acf(pptimeseries3$residuals, na.action=na.pass, lag.max=20)
#The in-sample forecast errors are stored in the named element "residuals" of the list variable returned by forecast.HoltWinters().
Box.test(pptimeseries3$residuals, lag=20, type="Ljung-Box")
##
## Box-Ljung test
##
## data: pptimeseries3$residuals
## X-squared = 19.883, df = 20, p-value = 0.4652
Since the p-value (0.4652) is well above 0.05, there is little evidence of non-zero autocorrelations in the in-sample forecast errors at lags 1-20.
# make a time plot to look at variance
plot.ts(pptimeseries3$residuals)
na.remove(pptimeseries3$residuals) -> PPT_S_R
# Function to check if errors are normally distributed
plotForecastErrors <- function(PPT_S_R)
{
# make a histogram of the forecast errors:
mybinsize <- IQR(PPT_S_R)/4
mysd <- sd(PPT_S_R)
mymin <- min(PPT_S_R) - mysd*5
mymax <- max(PPT_S_R) + mysd*3
# generate normally distributed data with mean 0 and standard deviation mysd
mynorm <- rnorm(10000, mean=0, sd=mysd)
mymin2 <- min(mynorm)
mymax2 <- max(mynorm)
if (mymin2 < mymin) { mymin <- mymin2 }
if (mymax2 > mymax) { mymax <- mymax2 }
# make a red histogram of the forecast errors, with the normally distributed data overlaid:
mybins <- seq(mymin, mymax, mybinsize)
hist(PPT_S_R, col="red", freq=FALSE, breaks=mybins)
# freq=FALSE ensures the area under the histogram = 1
# bin the normal data using the same breaks (without plotting)
myhist <- hist(mynorm, plot=FALSE, breaks=mybins)
# plot the normal curve as a blue line on top of the histogram of forecast errors:
points(myhist$mids, myhist$density, type="l", col="blue", lwd=2)
}
# make a histogram
plotForecastErrors(PPT_S_R)
Run this code with the temperature dataset attached and pay attention to the forecast confidence intervals. Why does it perform better than it did on the precipitation dataset?
Try running scale() or BoxCox.lambda() to transform the data so that the forecast values stay non-negative.
Is Holt-Winters appropriate for forecasting data sets with non-negative values? Why or why not?
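One way to attempt the Box-Cox part of the exercise, sketched on the built-in AirPassengers series (chosen only because it is strictly positive; the choice of series and horizon are assumptions, not part of the original notes):

```r
library(forecast)
# Box-Cox transform, fit, forecast, then back-transform (a sketch)
lambda <- BoxCox.lambda(AirPassengers)          # estimate the transformation parameter
fit <- HoltWinters(BoxCox(AirPassengers, lambda))
fc <- forecast(fit, h = 24)
point_fc <- InvBoxCox(fc$mean, lambda)          # back-transform the point forecasts
all(point_fc >= 0)                              # back-transformed forecasts stay non-negative
```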
Two Parts
Autoregressive Model (AR): regresses the current value of the series on its own previous values, using a "moving window" in which the most recent observations become the predictors for the next value.
Moving Average (MA): regresses the current value on past forecast errors. (Despite the name, this is not the same as a moving-average smoother used to reduce random variation.)
(Can also have a seasonal component.)
*As with all models, include only the necessary terms.
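The two building blocks can be simulated with stats::arima.sim() to see how they differ (a sketch, not part of the original notes; the coefficients are arbitrary):

```r
# Simulate an AR(1) and an MA(1) process and compare their ACFs
set.seed(1)
ar1 <- arima.sim(model = list(ar = 0.7), n = 200)  # AR(1): x_t depends on x_{t-1}
ma1 <- arima.sim(model = list(ma = 0.5), n = 200)  # MA(1): x_t depends on e_{t-1}
op <- par(mfrow = c(2, 2))
plot(ar1, main = "AR(1)"); acf(ar1)  # ACF decays gradually
plot(ma1, main = "MA(1)"); acf(ma1)  # ACF cuts off after lag 1
par(op)
```

The contrasting ACF shapes are exactly what the diagnostic plots below are used to read off.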
data(co2)
str(co2)
## Time-Series [1:468] from 1959 to 1998: 315 316 316 318 318 ...
plot(co2)
Autocorrelation Plot
acf(co2, type = "correlation")
Yes, there is a trend.
Options: differencing (subtracting each value from the previous one), subtracting a group mean, or fitting a curve and subtracting the fitted values.
pracma::detrend(): computes the least-squares fit of a straight line to the data and subtracts the resulting function from the data.
# subtracting yearly mean
yearlymean <- ave(co2, gl(39, 12), FUN = mean)
co2.dt <- co2 - yearlymean
# blackbox way to do it but not necessary
# co2.dt <- pracma::detrend(co2.m, tt = "constant")
# co2.dt <- ts(as.numeric(co2.dt), start = c(1959, 1), frequency = 12)
# will get the same result as above
plot(co2.dt)
Seasonal Subseries Plot
monthplot(co2.dt, ylab = "data")
Autocorrelation Plot
acf(co2.dt, type = "correlation")
Yes, there is seasonality.
Seasonal differencing
diff(): takes a time series (ts), applies a given lag, and returns the lagged differences of the series.
# differencing once
co2.dt.dif <- diff(co2.dt,lag = 12)
plot(co2.dt.dif)
Check diagnostic plots again.
Seasonal Subseries Plot
monthplot(co2.dt.dif, ylab = "data")
Autocorrelation Plot: Still some seasonality showing up in this plot
acf(co2.dt.dif, type = "correlation")
Based on the autocorrelation plot, the model will need a seasonal component to address the strong seasonality that remains after differencing.
CAVEAT: Be judicious with differencing more than once; you can run into issues of overcorrecting.
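One guard against over-differencing: the forecast package (already loaded above) has unit-root-based helpers that estimate how many differences a series needs. A sketch:

```r
library(forecast)
# Estimated number of regular and seasonal differences needed for co2
ndiffs(co2)    # suggested number of first differences
nsdiffs(co2)   # suggested number of seasonal differences
```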
This dataset has roughly equal variance. If it did not, a natural log transformation could be used to stabilize it.
Seasonal Trend Decomposition using loess
Loess: a non-parametric regression method that fits many simple local regression models to k-nearest-neighbor subsets of the data.
stl(): Decomposes time series into seasonal, trend and irregular components using loess
co2.stl <- stl(co2,s.window = "periodic")
plot(co2.stl)
# black box?
Assess the shape of the autocorrelation or partial autocorrelation plots to determine model.
Based on the autocorrelation plot…
acf(co2.dt.dif, type = "correlation")
ARIMA model with possible seasonal autocorrelation
Syntax looks like: AR(1) with a seasonal term at lag 12. The autoregression term has a lag of 1 because that is where the autocorrelation plot drops off, and the seasonal term has a lag of 12 because the seasonality follows a 12-month cycle.
In base R:
arima(): fit an ARIMA model to a timeseries dataset
arima0(): an older implementation of arima(); like arima(), its fits can be passed to predict() to forecast values out into the future
# ARIMA(1,0,0)[12]
# without seasonality on fully differenced data
results <- arima0(co2.dt.dif, order = c(1,0,0), include.mean = FALSE)
# exclude mean because estimation of mean is 0, since mean was taken out in the detrending
# ARIMA(1,0,0)(0,1,1)[12]
# with seasonality on detrended data with seasonality
resultsseason <- arima0(co2.dt, order = c(1,0,0), seasonal = c(0,1,1), include.mean = FALSE)
# order and seasonal take c(p,d,q): c(AR order, degree of differencing, MA order)
results
##
## Call:
## arima0(x = co2.dt.dif, order = c(1, 0, 0), include.mean = FALSE)
##
## Coefficients:
## ar1
## 0.3663
## s.e. 0.0438
##
## sigma^2 estimated as 0.1175: log likelihood = -158.92, aic = 321.84
resultsseason
##
## Call:
## arima0(x = co2.dt, order = c(1, 0, 0), seasonal = c(0, 1, 1), include.mean = FALSE)
##
## Coefficients:
## ar1 sma1
## 0.3809 -0.8609
## s.e. 0.0436 0.0020
##
## sigma^2 estimated as 0.07152: log likelihood = -53.58, aic = 113.15
acf(results$residuals)
acf(resultsseason$residuals)
The ARIMA model with the seasonal component does much better when comparing the AICs, as we would expect.
Another option in forecast package:
auto.arima(): takes a time series (ts) and selects the best ARIMA model based on the AIC, AICc or BIC.
resultsauto <- forecast::auto.arima(co2)
resultsauto
## Series: co2
## ARIMA(1,1,1)(1,1,2)[12]
##
## Coefficients:
## ar1 ma1 sar1 sma1 sma2
## 0.2569 -0.5847 -0.5489 -0.2620 -0.5123
## s.e. 0.1406 0.1203 0.5881 0.5703 0.4820
##
## sigma^2 estimated as 0.08576: log likelihood=-84.39
## AIC=180.78 AICc=180.97 BIC=205.5
acf(resultsauto$residuals)
Be careful! The computer has a tendency to overfit. It is important to look at the number of terms in the equation and make sure some have not cancelled others out. In this case, the seasonal MA and the non-seasonal AR introduce opposite correlations, effectively canceling each other out. It is definitely important to unpack the black box.
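One way to rein in the automated search is to cap the number of terms auto.arima() may use (a sketch; the limits chosen here are arbitrary, not a recommendation from the original notes):

```r
library(forecast)
# Constrain the search space and search it exhaustively rather than stepwise
fit_small <- auto.arima(co2, max.p = 1, max.q = 1, max.P = 1, max.Q = 1,
                        stepwise = FALSE)
fit_small
acf(fit_small$residuals)  # check the smaller model's residuals too
```

Comparing the constrained fit's AIC and residual ACF against the unconstrained one is a quick way to see whether the extra terms were earning their keep.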
# plot
fitted.co2.dt <- co2.dt - resultsseason$residuals
ts.plot(co2.dt, fitted.co2.dt, col = 1:2)
# on original data
fitted.co2 <- fitted.co2.dt + yearlymean
ts.plot(co2, fitted.co2, col = 1:2)
arima0() will deal with trends and seasonality directly through its differencing terms (unequal variance still calls for a transformation first). Make sure you know which terms to put in the model.
final <- arima0(co2, order = c(1,1,0), seasonal = c(0,1,1))
acf(final$residuals)
co2.fitted <- co2 - final$residuals
predict.final <- predict(final,n.ahead = 36,se.fit = FALSE)
ts.plot(co2, co2.fitted, predict.final, col = 1:3)
# and standard error
# from which you can get confidence intervals
predict.final.se <- predict(final,n.ahead = 36)
predict.final.se$se
## Jan Feb Mar Apr May Jun Jul
## 1998 0.2890782 0.3541776 0.4219255 0.4768103 0.5268911 0.5723680 0.6145538
## 1999 0.8352279 0.8724731 0.9091511 0.9441348 0.9779441 1.0106014 1.0422421
## 2000 1.2239645 1.2563289 1.2886397 1.3199425 1.3505820 1.3805239 1.4098351
## Aug Sep Oct Nov Dec
## 1998 0.6540063 0.6912155 0.7265201 0.7601873 0.7924252
## 1999 1.0729483 1.1028004 1.1318653 1.1602023 1.1878635
## 2000 1.4385477 1.4666988 1.4943195 1.5214389 1.5480833
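Those standard errors convert directly into interval bands. A sketch of approximate 95% prediction intervals built from the same model:

```r
# Approximate 95% prediction intervals from the forecast standard errors
final <- arima0(co2, order = c(1, 1, 0), seasonal = c(0, 1, 1))
pr <- predict(final, n.ahead = 36)
upper <- pr$pred + 1.96 * pr$se   # point forecast plus ~2 standard errors
lower <- pr$pred - 1.96 * pr$se
ts.plot(co2, pr$pred, upper, lower, col = c(1, 2, 4, 4), lty = c(1, 1, 2, 2))
```

Note how the bands widen with the horizon, matching the growing standard errors in the table above.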
ARIMA models are tricky to build. It is easy to overfit the model. Statistical experience definitely helps.
It is important to open the black box and fully understand what the model is doing.
Diagnostics are critical, but open to interpretation.
head(austres) # Numbers (in thousands) of Australian residents measured quarterly from March 1971 to March 1994. The object is of class "ts".
## [1] 13067.3 13130.5 13198.4 13254.2 13303.7 13353.9
Run the appropriate tests and build an ARIMA model for it. How confident are you in the ARIMA model? What are some drawbacks to using the model? What are some advantages?
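A possible first diagnostic step for this exercise (a sketch only, not the full answer):

```r
# Initial diagnostics on the quarterly austres series
plot(austres)                 # strong upward trend in the quarterly series
austres.dif <- diff(austres)  # first difference to remove the trend
plot(austres.dif)
acf(austres.dif)              # check what autocorrelation structure remains
```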